AuthZService: improve authz caching #9

akshayutture-augment · 2025-11-20T00:37:36Z

Test 2

Replicated from ai-code-review-evaluation/grafana-greptile#2

* remove the use of client side cache for in-proc authz client Co-authored-by: Gabriel MABILLE <[email protected]> * add a permission denial cache, fetch perms if not in either of the caches Co-authored-by: Gabriel MABILLE <[email protected]> * Clean up tests Co-authored-by: Ieva <[email protected]> * Cache tests Co-authored-by: Ieva <[email protected]> * Add test to list + cache Co-authored-by: Ieva <[email protected]> * Add outdated cache test Co-authored-by: Ieva <[email protected]> * Re-organize metrics Co-authored-by: Ieva <[email protected]> --------- Co-authored-by: Gabriel MABILLE <[email protected]>

akshayutture-augment · 2025-11-20T00:40:31Z

@claude review this PR

akshayutture-augment · 2025-11-20T01:01:09Z

@claude review this PR

claude · 2025-11-20T01:01:31Z

Claude finished @akshayutture-augment's task —— View job

PR Review Complete

PR: AuthZService: improve authz caching

I've reviewed the changes in this PR which improves caching for the authorization service. Here's my comprehensive review:

Summary of Changes

This PR implements two key caching improvements:

Removes client-side caching for in-process authz clients (using NoopCache)
Adds a permission denial cache to avoid redundant database lookups for denied permissions
Refactors cache checking logic to check cache first, then fall back to database

Positive Aspects ✅

Smart denial caching: The new permDenialCache (lines 116-120 in service.go) prevents repeated database queries for permissions that were already denied. This is a clever optimization for negative results.
Excellent test coverage: The PR adds comprehensive test cases for:
- Cache hit scenarios
- Cache miss/fallback scenarios
- Outdated cache scenarios
- Explicit denial cache behavior
Proper cache key specificity: The userPermDenialCacheKey includes all relevant parameters (namespace, userUID, action, name, parent) to ensure cache correctness (cache.go:30-32).
Backward-compatible refactoring: The logic reorganization maintains the same external behavior while improving performance.

Issues & Concerns ⚠️

1. Potential Cache Inconsistency Issue (High Priority)

Location: service.go:116-121 and service.go:153-156

The denial cache entry is set when allowed == false, but this creates a potential issue:

cachedPerms, err := s.getCachedIdentityPermissions(ctx, checkReq.Namespace, checkReq.IdentityType, checkReq.UserUID, checkReq.Action)
if err == nil {
    allowed, err := s.checkPermission(ctx, cachedPerms, checkReq)
    // If cached perms exist but permission is denied, we return early WITHOUT setting denial cache
    if allowed {
        return &authzv1.CheckResponse{Allowed: allowed}, nil
    }
}
// Denial cache only set here (line 153-156)
if !allowed {
    s.permDenialCache.Set(ctx, permDenialKey, true)
}

Problem: If cached permissions exist but don't grant access, the code returns early at line 122-135 without setting the denial cache. The denial cache is only set at lines 153-156 when fetching fresh permissions from the database. This means:

First check with cached perms → denied, no denial cache set
Second identical check → fetches from cache again, checks perms again (not optimized)

Recommendation: Set the denial cache in both paths when access is denied:

if err == nil {
    allowed, err := s.checkPermission(ctx, cachedPerms, checkReq)
    if err != nil { ... }
    if allowed {
        // Cache hit and allowed
        return &authzv1.CheckResponse{Allowed: allowed}, nil
    } else {
        // Cache hit but denied - set denial cache here too
        s.permDenialCache.Set(ctx, permDenialKey, true)
        return &authzv1.CheckResponse{Allowed: false}, nil
    }
}

2. Missing Cache Invalidation Strategy (Medium Priority)

Location: Throughout caching implementation

The PR doesn't address cache invalidation when:

User permissions are updated
Team memberships change
Roles are modified
Folders are moved/deleted

The denial cache with a 30-second TTL (shortCacheTTL) mitigates this somewhat, but users could experience up to 30 seconds of stale permissions after changes.

Recommendation: Document the cache invalidation strategy or consider implementing cache invalidation hooks for permission changes.

3. NoopCache for In-Process Client (Low Priority)

Location: rbac.go:101-105

The PR changes the in-process RBAC client to use NoopCache instead of the local cache:

rbacClient := authzlib.NewClient(
    channel,
    authzlib.WithCacheClientOption(&NoopCache{}),  // Was: local cache
    authzlib.WithTracerClientOption(tracer),
)

Question: Why disable client-side caching for in-process clients? The commit message says "remove the use of client side cache for in-proc authz client" but doesn't explain the rationale. Is this because:

The server already has caching, making client cache redundant?
Double caching was causing issues?
Memory concerns?

Recommendation: Add a code comment explaining why NoopCache is used for in-process clients.

4. Inconsistent Early Return in Check() Method (Low Priority)

Location: service.go:116-135

The early cache check logic is slightly inconsistent with error handling:

Denial cache check (lines 116-120): Returns immediately on cache hit
Permission cache check (lines 123-135): Returns immediately only if allowed == true

When cachedPerms exist but deny access, the code continues to fetch fresh permissions from the database. This seems intentional (to handle outdated cache), but it means cached denials are treated differently than explicit denial cache entries.

Recommendation: Clarify the intended behavior - should cached denials also be trusted, or is the current "fetch fresh on cache denial" behavior intentional as a safety mechanism?

Minor Issues 🔍

5. Test Removal Without Clear Justification

Location: service_test.go:342-350

A test case was removed:

// REMOVED:
{
    name: "should return permissions from cache if available",
    permissions: []accesscontrol.Permission{
        {Action: "dashboards:read", Scope: "dashboards:uid:some_dashboard"},
    },
    cacheHit:      true,
    expectedPerms: map[string]bool{"dashboards:uid:some_dashboard": true},
},

This test verified cache behavior in getUserPermissions(). After refactoring, the cache check moved to getCachedIdentityPermissions(), which is tested elsewhere. However, removing this test slightly reduces coverage of the full flow.

Recommendation: Consider keeping a similar test or document why it's redundant.

6. Code Cleanup Artifact

Location: rbac.go:160

The function newRBACClient() was removed and its logic inlined, leaving this slightly awkward code structure:

func newRemoteRBACClient(clientCfg *authzClientSettings, tracer trace.Tracer) (authlib.AccessClient, error) {
    // ... connection setup ...
    
    client := authzlib.NewClient(...)  // created but immediately returned
    return client, nil
}

Recommendation: Minor - could directly return the result of authzlib.NewClient(), but current code is fine.

Performance Considerations 📊

Positive Impact:

Denial cache should significantly reduce database load for unauthorized access attempts
Removing redundant client-side cache for in-process clients reduces memory overhead

Potential Concerns:

Cache checking overhead is minimal (map lookups)
The additional denial cache increases memory usage slightly, but with 30s TTL this should be negligible

Code Quality 📝

Strengths:

Clean separation of concerns with getCachedIdentityPermissions()
Good use of metrics for observability
Comprehensive test coverage of caching behavior

Areas for Improvement:

Missing code comments explaining the caching strategy
No documentation of cache TTL choices (why 30s vs 2min?)

Recommendations Summary

Must Address:

⚠️ Fix the cache inconsistency issue where denial cache isn't set when using cached permissions (Unified Storage Performance Optimizations #1 above)

Should Consider:
2. Document cache invalidation strategy
3. Add comment explaining NoopCache usage for in-process clients
4. Clarify the intended behavior for cached permission denials

Nice to Have:
5. Add inline comments explaining the caching strategy
6. Document why specific TTL values were chosen

Verdict

This PR makes solid improvements to caching performance, particularly with the denial cache optimization. The test coverage is excellent. However, there's a potential cache inconsistency issue (point #1) that should be addressed before merging to ensure the denial cache works as intended in all scenarios.

Recommendation: Request changes to address the cache inconsistency issue, then approve.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

AuthZService: improve authz caching #9

AuthZService: improve authz caching #9

Uh oh!

akshayutture-augment commented Nov 20, 2025

Uh oh!

akshayutture-augment commented Nov 20, 2025

Uh oh!

akshayutture-augment commented Nov 20, 2025

Uh oh!

claude bot commented Nov 20, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

AuthZService: improve authz caching #9

Are you sure you want to change the base?

AuthZService: improve authz caching #9

Uh oh!

Conversation

akshayutture-augment commented Nov 20, 2025

Uh oh!

akshayutture-augment commented Nov 20, 2025

Uh oh!

akshayutture-augment commented Nov 20, 2025

Uh oh!

claude bot commented Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete

Summary of Changes

Positive Aspects ✅

Issues & Concerns ⚠️

1. Potential Cache Inconsistency Issue (High Priority)

2. Missing Cache Invalidation Strategy (Medium Priority)

3. NoopCache for In-Process Client (Low Priority)

4. Inconsistent Early Return in Check() Method (Low Priority)

Minor Issues 🔍

5. Test Removal Without Clear Justification

6. Code Cleanup Artifact

Performance Considerations 📊

Code Quality 📝

Recommendations Summary

Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

claude bot commented Nov 20, 2025 •

edited

Loading